Building species trees from larger parts of phylogenomic databases
Authors
Abstract
Gene trees are leaf-labeled trees inferred from molecular sequences. Due to duplication events arising in genome evolution, gene trees usually have multiple copies of some labels, i.e., species. Inferring a species tree from a set of multi-labeled gene trees (MUL trees) is a well-known problem in computational biology. We propose a novel approach to this problem, which mainly consists of transforming a collection of MUL trees into a collection of evolutionary trees, each containing single copies of labels. To that end, we provide several algorithmic building blocks and describe how they fit within a general species tree inference process. First, we propose to preprocess each MUL tree separately in order to remove the parts that are redundant with respect to speciation events. For this purpose, we present a tree isomorphism algorithm for MUL trees that can be applied to the pairs of subtrees hanging from duplication nodes. This preprocessing lowers the number of duplication nodes in gene trees. For the gene trees that still have duplication nodes, we define the topological information of a MUL tree that can be thought of as being unambiguously related to speciation events. When the MUL tree contains a coherent speciation signal, we show that we can replace the MUL tree with a single-labeled tree representing its speciation information. Otherwise, we propose to extract a maximum subtree that is free of duplication events. Most algorithms have linear time complexity, except for a fixed-parameter tractable (FPT) algorithm proposed for a problem that we show to be intractable. The algorithms described in this paper are used to analyse the HOGENOM database, a database of homologous genes from fully sequenced genomes.
Similar resources
Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees
Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high sp...
Phylogenomic Reconstruction of the Oomycete Phylogeny Derived from 37 Genomes
The oomycetes are a class of microscopic, filamentous eukaryotes within the Stramenopiles-Alveolata-Rhizaria (SAR) supergroup which includes ecologically significant animal and plant pathogens, most infamously the causative agent of potato blight Phytophthora infestans. Single-gene and concatenated phylogenetic studies both of individual oomycete genera and of members of the larger class have r...
Estimating Optimal Species Trees from Incomplete Gene Trees Under Deep Coalescence
The estimation of species trees typically involves the estimation of trees and alignments on many different genes, so that the species tree can be based on many different parts of the genome. This kind of phylogenomic approach to species tree estimation has the potential to produce more accurate species tree estimates, especially when gene trees can differ from the species tree due to processes...
Managing and analyzing phylogenetic databases
The ever growing availability of phylogenomic data makes it increasingly possible to study and analyze phylogenetic relationships across a wide range of species. Indeed, current phylogenetic analyses are now producing enormous collections of trees that vary greatly in size. Our proposed research addresses the challenges posed by storing, querying, and analyzing such phylogenetic databases. Our ...
Coalescent-Based Genome Analyses Resolve the Early Branches of the Euarchontoglires
Despite numerous large-scale phylogenomic studies, certain parts of the mammalian tree are extraordinarily difficult to resolve. We used the coding regions from 19 completely sequenced genomes to study the relationships within the super-clade Euarchontoglires (Primates, Rodentia, Lagomorpha, Dermoptera and Scandentia) because the placement of Scandentia within this clade is controversial. The d...
Journal title:
- Inf. Comput.
Volume 209, issue
Pages -
Publication date: 2011